76 research outputs found

    Search for expertise : going beyond direct evidence

    The automatic search for knowledgeable people within the scope of an organization is a key function that makes modern enterprise search systems commercially successful and socially demanded. A number of effective approaches to expert finding have recently been proposed in academic publications. Although most of them use reasonably defined measures of personal expertise, they often limit themselves to rather unrealistic and sometimes oversimplified principles. In this thesis, we explore several ways to go beyond the state-of-the-art assumptions used in research on expert finding and propose several novel solutions for this and related tasks. First, we describe measures of expertise that do not assume independent occurrence of terms and persons in a document, which makes them perform better than measures based on the independence of all entities in a document. One of these measures makes persons central to the process of term generation in a document. Another assumes that the position of a person's mention in a document, relative to the positions of the query terms, indicates the person's relation to the document's relevant content. Second, we find ways to use more than the direct expertise evidence for a person that is concentrated within the document space of the person's current employer and within those organizational documents that mention the person. We successfully utilize the predictive potential of additional indirect expertise evidence that is publicly available on the Web and contained in organizational documents implicitly related to a person. Finally, besides the expert finding methods we propose, we also demonstrate solutions for tasks from related domains. In one case, we use several algorithms of multi-step relevance propagation to search for typed entities in Wikipedia. In another case, we suggest generic methods for placing photos uploaded to Flickr on the world map, using language models of locations built entirely from user-provided annotations, with a few task-specific extensions.
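    The proximity-based measure mentioned above lends itself to a small illustration. The following is a minimal sketch assuming a Gaussian kernel over token distances; the function name `proximity_score`, the kernel width `sigma`, and the toy document are illustrative assumptions, not the thesis's exact model.

```python
import math

def proximity_score(doc_tokens, person_positions, query_terms, sigma=5.0):
    """Hypothetical sketch: score how strongly a person's mentions in a document
    are associated with the query, using a Gaussian kernel over the distance
    between mention positions and query-term positions (token offsets)."""
    query_positions = [i for i, t in enumerate(doc_tokens) if t in query_terms]
    if not query_positions or not person_positions:
        return 0.0
    score = sum(math.exp(-((p - q) ** 2) / (2.0 * sigma ** 2))
                for p in person_positions for q in query_positions)
    return score / (len(person_positions) * len(query_positions))

# Alice is mentioned next to the query terms, Bob far from them, so Alice scores higher.
doc = "bob handled catering while alice presented the retrieval experiments".split()
print(proximity_score(doc, person_positions=[4], query_terms={"retrieval", "experiments"}))
print(proximity_score(doc, person_positions=[0], query_terms={"retrieval", "experiments"}))
```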

    University of Twente at the TREC 2007 Enterprise Track : modeling relevance propagation for the expert search task

    This paper describes several approaches that we used for the expert search task of the TREC 2007 Enterprise track. We studied several methods of relevance propagation from documents to related candidate experts. Instead of the one-step propagation from documents to directly related candidates used by many systems in previous years, we do not limit the relevance flow and disseminate it further through mutual document-candidate connections. We model relevance propagation using random-walk principles, or in formal terms, discrete Markov processes. We experiment with infinite and finite numbers of propagation steps. We also demonstrate how additional information, namely hyperlinks among documents, the organizational structure of the enterprise, and relevance feedback, may be utilized by the presented techniques.
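    A minimal sketch of multi-step relevance propagation over a document-candidate graph, modelled as a discrete Markov process: document relevance flows to candidates and back through their mutual connections for a chosen number of steps. The transition construction, the toy association matrix, and the function name are assumptions, not the exact TREC runs.

```python
import numpy as np

def propagate_relevance(doc_scores, assoc, steps=3):
    """Propagate document relevance to candidate experts through document-candidate
    associations, iterating the random walk for a fixed number of steps.
    assoc[i, j] > 0 if document i mentions candidate j (toy association matrix)."""
    # Row-normalise to obtain transition probabilities document -> candidate and back.
    doc_to_cand = assoc / assoc.sum(axis=1, keepdims=True)
    cand_to_doc = (assoc / assoc.sum(axis=0, keepdims=True)).T
    cand_scores = doc_scores @ doc_to_cand          # one-step propagation
    for _ in range(steps - 1):                      # further steps via mutual connections
        doc_scores = cand_scores @ cand_to_doc
        cand_scores = doc_scores @ doc_to_cand
    return cand_scores

# Toy example: 3 documents, 2 candidate experts.
assoc = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0]])
doc_scores = np.array([0.9, 0.5, 0.1])              # query relevance of each document
print(propagate_relevance(doc_scores, assoc, steps=1))
print(propagate_relevance(doc_scores, assoc, steps=5))
```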

    The search for expertise: to the documents and beyond

    University of Twente at the TREC 2008 Enterprise Track: using the Global Web as an expertise evidence source

    This paper describes the details of our participation in the expert search task of the TREC 2008 Enterprise track. This is the fourth (and last) year of the TREC Enterprise Track and the second year the University of Twente (Database group) submitted runs for the expert finding task. In the methods used to produce these runs, we mostly rely on the predictive potential of those expertise evidence sources that are publicly available on the Global Web but not hosted at the website of the organization under study (CSIRO). This paper describes follow-up studies complementary to our recent research [8], which demonstrated how taking the web factor seriously significantly improves the performance of expert finding in the enterprise.
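    A hedged sketch of the general idea of mixing intranet evidence with Global Web evidence for a candidate. The linear mixture and the weight are illustrative assumptions, not the combination used in the submitted runs.

```python
def combined_expertise(enterprise_score, web_score, web_weight=0.6):
    """Hypothetical linear mixture of evidence from the organization's own
    documents and evidence collected from the Global Web for one candidate;
    the weight is an assumption, not a tuned value."""
    return (1.0 - web_weight) * enterprise_score + web_weight * web_score

# Rank candidates by the mixed score; a strong web presence can lift a candidate
# who is rarely mentioned in the organization's own documents.
candidates = {"ann": (0.2, 0.9), "bob": (0.7, 0.1)}
ranking = sorted(candidates, key=lambda c: combined_expertise(*candidates[c]), reverse=True)
print(ranking)
```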

    Intent Models for Contextualising and Diversifying Query Suggestions

    Query suggestion and auto-completion mechanisms help users type less while interacting with a search engine. A basic approach that ranks suggestions according to their frequency in the query logs is suboptimal. Firstly, many candidate queries with the same prefix can be removed as redundant. Secondly, the suggestions can also be personalised based on the user's context. These two directions for improving the quality of the aforementioned mechanisms can be in opposition: while the latter aims to promote suggestions that address search intents the user is likely to have, the former aims to diversify the suggestions to cover as many intents as possible. We introduce a contextualisation framework that utilises short-term context from the user's behaviour within the current search session, such as the previous query, the documents examined, and the candidate query suggestions that the user has discarded. This short-term context is used to contextualise and diversify the ranking of query suggestions by modelling the user's information need as a mixture of intent-specific user models. The evaluation is performed offline on a set of approximately 1.0M test user sessions. Our results suggest that the proposed approach significantly improves query suggestions compared to the baseline approach.
    Comment: A short version of this paper was presented at CIKM 201
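    A minimal sketch of scoring suggestion candidates under a mixture of intent-specific models estimated from short-term session context. The unigram models, the smoothing constant, the mixture weights, and the function names are assumptions rather than the paper's exact formulation.

```python
from collections import Counter

def intent_model(evidence_texts):
    """Hypothetical unigram intent model estimated from session evidence
    (previous query, examined documents, discarded suggestions)."""
    counts = Counter(w for text in evidence_texts for w in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def score_suggestion(suggestion, intents):
    """Score a candidate suggestion as a mixture of intent-specific models.
    `intents` is a list of (weight, model) pairs whose weights sum to 1."""
    words = suggestion.lower().split()
    score = 0.0
    for weight, model in intents:
        p = 1.0
        for w in words:
            p *= model.get(w, 1e-6)      # crude smoothing for unseen words
        score += weight * p
    return score

# Toy session: the previous query and an examined document suggest a "jaguar car" intent.
intents = [
    (0.7, intent_model(["jaguar xf price", "jaguar cars official site"])),
    (0.3, intent_model(["jaguar habitat rainforest animal"])),
]
candidates = ["jaguar price", "jaguar animal", "jaguar dealership"]
print(sorted(candidates, key=lambda s: score_suggestion(s, intents), reverse=True))
```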

    Generalized Team Draft Interleaving

    Interleaving is an online evaluation method that compares two ranking functions by mixing their results and interpreting the users' click feedback. An important property of an interleaving method is its sensitivity, i.e. the ability to obtain reliable comparison outcomes with few user interactions. Several methods have been proposed so far to improve interleaving sensitivity, which can be roughly divided into two areas: (a) methods that optimize the credit assignment function (how the click feedback is interpreted), and (b) methods that achieve higher sensitivity by controlling the interleaving policy (how often a particular interleaved result page is shown). In this paper, we propose an interleaving framework that generalizes the previously studied interleaving methods in two aspects. First, it achieves a higher sensitivity by performing a joint data-driven optimization of the credit assignment function and the interleaving policy. Second, we formulate the framework to be general w.r.t. the search domain where the interleaving experiment is deployed, so that it can be applied in domains with grid-based presentation, such as image search. In order to simplify the optimization, we additionally introduce a stratified estimate of the experiment outcome. This stratification is also useful on its own, as it reduces the variance of the outcome and thus increases the interleaving sensitivity. We perform an extensive experimental study using large-scale document and image search datasets obtained from a commercial search engine. The experiments show that our proposed framework achieves marked improvements in sensitivity over effective baselines on both datasets.
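    A minimal sketch of the classic Team Draft interleaving with click-based credit assignment, i.e. the baseline that the framework above generalizes. The uniform coin-flip policy and one-credit-per-click rule here are the standard baseline choices, not the jointly optimized policy and credit function of the paper; function names and the toy rankings are assumptions.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Build an interleaved result list: the team that has contributed fewer
    documents picks next (ties broken by a coin flip); each shown result
    remembers which team contributed it."""
    interleaved, teams = [], []
    count_a = count_b = 0
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        pick_a = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        source, team = (ranking_a, "A") if pick_a else (ranking_b, "B")
        doc = next((d for d in source if d not in interleaved), None)
        if doc is None:                      # this ranker is exhausted; the other picks
            source, team = (ranking_b, "B") if pick_a else (ranking_a, "A")
            doc = next((d for d in source if d not in interleaved), None)
            if doc is None:
                break
        interleaved.append(doc)
        teams.append(team)
        count_a += team == "A"
        count_b += team == "B"
    return interleaved, teams

def credit(teams, clicked_positions):
    """Baseline credit assignment: each click gives one unit to the team
    whose ranker contributed the clicked result."""
    wins = {"A": 0, "B": 0}
    for pos in clicked_positions:
        wins[teams[pos]] += 1
    return wins

interleaved, teams = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(interleaved, teams, credit(teams, clicked_positions=[0]))
```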

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, definition, and the data set released. For each task, a reference algorithm that was used within MediaEval 2010 is presented, and comments are included on lessons learned. The Tagging Task (Professional) involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task (Wild Wild Web) involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
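    A hedged sketch of the kind of tag-based approach commonly used for the Placing Task: estimate a tag distribution for each geographic cell from training annotations, then assign a test item to the most likely cell. The 1x1 degree grid, the smoothing constant, and the names are illustrative assumptions, not a reference algorithm from the benchmark.

```python
import math
from collections import Counter, defaultdict

def build_location_models(training_items):
    """training_items: list of (tags, (lat, lon)). Cells form a coarse 1x1 degree
    grid; each cell gets a tag count model and a representative coordinate."""
    models, coords = defaultdict(Counter), {}
    for tags, (lat, lon) in training_items:
        cell = (int(lat), int(lon))
        models[cell].update(tags)
        coords[cell] = (lat, lon)      # last training coordinate stands in for the cell
    return models, coords

def place(tags, models, coords, eps=1e-6):
    """Assign the cell whose smoothed tag distribution gives the tags the highest likelihood."""
    def loglik(cell):
        counts, total = models[cell], sum(models[cell].values())
        return sum(math.log((counts[t] + eps) / (total + eps)) for t in tags)
    return coords[max(models, key=loglik)]

train = [
    (["eiffel", "tower", "paris"], (48.86, 2.35)),
    (["louvre", "paris", "museum"], (48.86, 2.34)),
    (["colosseum", "rome"], (41.89, 12.49)),
]
models, coords = build_location_models(train)
print(place(["paris", "museum"], models, coords))   # expected: the Paris cell
```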

    Riemannian Optimization for Skip-Gram Negative Sampling

    The Skip-Gram Negative Sampling (SGNS) word embedding model, well known through its implementation in the "word2vec" software, is usually optimized by stochastic gradient descent. However, optimization of the SGNS objective can be viewed as a search for a good matrix under a low-rank constraint. The standard way to solve this type of problem is to apply the Riemannian optimization framework and optimize the SGNS objective over the manifold of matrices of the required low rank. In this paper, we propose an algorithm that optimizes the SGNS objective using Riemannian optimization and demonstrate its superiority over popular competitors, such as the original method of training SGNS and the SVD over the SPPMI matrix.
    Comment: 9 pages, 4 figures, ACL 201
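    A minimal sketch of the low-rank view of this kind of optimization: take a gradient step on the full matrix and retract back onto the set of rank-d matrices by truncated SVD. The projection-based retraction and the squared-error toy objective are simplifying assumptions; the paper optimizes the actual SGNS objective with a proper Riemannian retraction.

```python
import numpy as np

def low_rank_step(X, grad, lr, rank):
    """One optimization step constrained to rank-`rank` matrices:
    move against the gradient, then retract via truncated SVD."""
    Y = X - lr * grad
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Toy objective: approximate a target matrix M (standing in for an SPPMI-like
# matrix) with a rank-2 matrix by minimizing squared error.
rng = np.random.default_rng(0)
M = rng.normal(size=(30, 20))
X = np.zeros_like(M)
for _ in range(200):
    grad = X - M                        # gradient of 0.5 * ||X - M||_F^2
    X = low_rank_step(X, grad, lr=0.5, rank=2)
print(np.linalg.norm(X - M))            # approaches the best rank-2 approximation error
```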